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ABSTRACT 

It is known that multiplication of linear differential opera- 
tors over ground fields of characteristic zero can be reduced 
to a constant number of matrix products. We give a new 
algorithm by evaluation and interpolation which is faster 
than the previously-known one by a constant factor, and 
prove that in characteristic zero, multiplication of differen- 
tial operators and of matrices are computationally equiva- 
lent problems. In positive characteristic, we show that dif- 
ferential operators can be multiplied in nearly optimal time. 
Theoretical results are validated by intensive experiments. 

Categories and Subject Descriptors: 
1.1.2 [Computing Methodologies]: Symbolic and Alge- 
braic Manipulation - Algebraic Algorithms 

General Terms: Algorithms, Theory 

Keywords: Fast algorithms, differential operators. 

1. INTRODUCTION 

Multiplication in polynomial algebras K[X] and K[X, Y] 
over a field K has been intensively studied in the computer- 
algebra literature. Since the discovery of Karatsuba's algo- 
rithm and the Fast Fourier Transform, hundreds of articles 
have been dedicated to theoretical and practical issues; see, 
e.g., [9, Ch. 8], [1], and the references therein. Not only 
are many other operations built upon multiplication, but 
often their complexity can be expressed in terms of the com- 
plexity of multiplication — whether as a constant number of 
multiplications or a logarithmic number of multiplications. 
In K[X], this is the case for Euclidean division, gcd and 
resultant computation, multipoint evaluation and interpola- 
tion, shifts, certain changes of bases, etc. 

In the noncommutative setting of linear ordinary differ- 
ential operators, the study is by far less advanced. The 
complexity of the product has been addressed only recently, 
by van der Hoeven in the short paper [11]: multiplication of 
operators over ground fields K of characteristic zero can be 
reduced by an evaluation-interpolation scheme to a constant 
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number C of matrix multiplications with elements in K. 
Work in progress [3] suggests that linear algebra is again 
the bottleneck for computations of GCRDs and LCLMs. 

This work aims at deepening the study started in [11] for 
characteristic 0. We improve van der Hoeven's result along 
several directions: We make the constant factor C explicit 
in §3.2 and improve it in §4, and we prove in §3 that multi- 
plication of matrices and of differential operators are equiv- 
alent computational problems — that is, they share the same 
exponent, thus answering the question left open in [11, §6, 
Remark 2] . As usual, those results hold for sufficiently large 
characteristic as well. We prolong the study to the case of 
(small) positive characteristic, by giving in §5 an algorithm 
for computing the product of two differential operators in 
softly quadratic complexity, that is, nearly optimally in the 
output size. This indicates that the equivalence result may 
fail to generalize to arbitrary fields. 

In what follows, the field K has characteristic zero, un- 
less stated otherwise. K[X]{9) and K[X]{^) respectively de- 
note the associative algebras M.{X,d;dX — Xd + 1) and 
K{X,6;eX = Xi6 + 1)). 





vdHe 


IvdHe 


vdH 


IvdH 


MulWeyl 


Product by blocks 


37 


24 


96 


48 


12 


Zeros -I- Strassen 


20 


8 


47 


12 


8 



Table 1: Number of ?i x ?i matrix products for multi- 
plication in M.[X]{9), resp. K[X]{d), in bidegree (n, n). 



Table 1 encapsulates our improvements on the constant C. 
It displays the cost of linear algebra in van der Hoeven's al- 
gorithms (vdHg, resp. vdH) and in the improved versions 
(IvdHs, resp. IvdH), which are described in §3.1, resp. §4.1, 
and in our algorithm (MulWeyl) in §4.2. The subscript 9 
refers to multiplication in K[A'](^); its absence means a 
product in K[X]{9). The first row provides bounds on the 
number of n x n matrix products used in each algorithm for 
multiplying operators in K[X](9), resp. K[X]{6), of degree 
at most n in X and in d, resp. 6, under the naive complex- 
ity estimate (1) below. This estimate reflects the choice of 
multiplying rectangular matrices by decomposing them into 
square blocks. The second row gives tighter bounds under 
the assumptions that: (i) any product by a zero block is dis- 
carded; (a) when possible, a product of two 2x2 matrices of 
n X n blocks is computed as 7 block products, instead of 8, 
by using Strassen's algorithm [14]; (in) predicted non-trivial 
zero blocks in the output are not computed. 



Canonical form and bidegree. In the algebra K[X]{9), 
resp. K[X](6), the commutation rule allows one to rewrite 
any given element into a so-called canonical form with X on 
the left of monomials and d, resp. 9, on the right, that 
is, as a linear combination of monomials X^d^ , resp. X'6\ 
for uniquely-defined coefficients from K. In either case, we 
speak of an element of bidegree (d, r), resp. at most (d, r), 
when the degree of its canonical form in X is d, resp. at 
most d, and that in d, resp. 6, is r, resp. at most r. With 
natural notation, the bidegree {dc,rc) of a product C = BA 
clearly satisfies rc = ta + tb and dc < dA + ds- 

The problem of computing the canonical form of the prod- 
uct of two elements of bidegree {d,r) from ]K[X](9), resp. 
from K[Jf](S), given in canonical form, is denoted {d,r)e, 
resp. {d,r)g. 

Complexity measures. AU complexity estimates are given 
in terms of arithmetical operations in K, which we denote 
"ops." We denote by Ce,Cs : N — > N two functions such 
that Problems (n,n)g and (n,n)g can be solved in Cg{n) 
and Ce(n), respectively. We denote by M : N — > N a func- 
tion such that polynomials of degree at most n in K[X] 
can be multiplied in M(n) ops. Using Fast Fourier Trans- 
form algorithms, M(n) can be taken in C(n log n) over fields 
with suitable roots of unity, and 0(71 log n loglogn) in the 
general case [13, 5]. We use the notation / G 0{g) for 
/, 5 : N ^ N if / is in ©(p log™ g) for some m > 1. For 
instance, M(n) is in 0{n). The problem of multiplying an 
m X n matrix by an n x p matrix is written {m,n,p). We 
let MM : N'' ^ N be a function such that Problem {m,n,p) 
can be solved in MM(m, n,p) ops. We use the abbreviation 
MM(n) for MM(n,n,?i). The current tightest (strict) up- 
per bound 2.376 for uo such that MM(n) £ 0{n'^) is derived 
in [7]. For the time being, this estimate is only of theo- 
retical relevance. Few practical algorithms with complexity 
better than cubic are currently known for matrix multiplica- 
tion, among which Strassen's algorithm [14] with exponent 
logj 7 ~ 2.807 and the Pan~Kaporin algorithm [12] with 
exponent 2.776. For rectangular matrix multiplication, we 
shall use the estimate 

UW{an,hn,cn) < ahcWM{n), for a, 6, c e N, (1) 

obtained by performing the naive product of a x & by 6 x c 
matrices whose coefficients are n x n blocks. 

Furthermore, we assume that M(n), MM(?i), Cgin), and 
Ce(n) satisfy the usual super-linearity assumption of [9, §8.3, 
Eq. (9)] and also that, if F(n) is any of these functions, then 
F(cn) belongs to C'(F(n)), for all positive constants c. 

Useful complexity results. Throughout, we shall freely use 
several classical results on the complexity of basic polyno- 
mial operations. They are encapsulated in Lemma 1. The 
corresponding algorithms are found in: [8, Algorithm E] 
for (a); [9, Chapter 10] for (b); [10, Th. 2.4 and 2.5] for (c); 
and [9, Cor. 8.29] for (d). 

Lemma 1 Let K be an arbitrary field. Let a £ K, let 
P{X) G K[X] be of degree less than n and f,g € K[X, Y] of 
degree at most d in X and n in Y . One can perform: (a) the 
Taylor shift Q{X) := P{X + a); (b) the multipoint evalua- 
tion and interpolation of P on a,a-\-l, . . . ,a-\-n if the char- 
acteristic of K. is or greater than n; (c) the base change 



between the monomial and the falling factorial basis {X)k = 
X{X - 1) ■ ■ ■ {X - k + 1) in 0{M{n) logn) ops. Moreover, 
one computes: (d) the product h = fg in 0(M(dn)) ops. 

2. NAIVE ALGORITHMS 

In this section, we provide complexity estimates for several 
known algorithms for {d,r)a. We set 

r d r d r 

^ = E E B = J2Y1 = E '^'(^)^"- 

1=0 j=0 1=0 j=0 1=0 

For any L = E[=o '"iX)d' = Ej=o ^'l'^^), we define 

dL _^ dh{X) i dL , dl'^{d) 
dX ^ dX ' dd ^ dd ' 



Naive expansion. The most naive calculation of BA is by 
expanding each d^X^ in the equality 

BA = ±Y^±Y^b.,a,,X^ {d^X')d\ 
i=o i=o fc=o i=o 

Using Leibniz's formula d^X^ = Er=o''''* (Ofe 
and the recurrences {l)k+\ = {l)kil-k) and {^^^) = (1)^^^, 
the canonical form of d'X'' is computed in 0(min(i, Z)) ops. 
This induces a complexity C(d^r^ min(d, r)) for comput- 
ing BA. The estimate simplifies to 0{n^) ii d — r = n. 

Iterative schemes. Another calculation is by the formula 

r 

Byl = E^«(^) ('9'^) (2) 

i=0 

and the observation that d^A has bidegree at most (d, r-\-i) 
and is computed from d^~^A in 0{dr) ops. by the identity 

dT 

dT = Td+— for T = r(X, d). (3) 
dX 

Therefore, the overall complexity is C'(M(d)r^ -\- dr^J = 
0(M(d)r2). When d = r = n, this is 0{M{n)n'^), and 
0(ri'^) if FFT is used. Similar considerations based on 

dT 

TX ^ XT + — for T = T{X, d) (4) 

provide an algorithm in 0(dP M(r)) , and one can always use 
the better algorithm by first comparing d and r. 

Another formula, attributed to Takayama and used in sev- 
eral implementations (Takayama's Kan system [15]; Maple's 
Ore_algebra by Chyzak [6]), is given by the (finite) sum 

o . V- 1 (d^'B d''A\ 
^^-E^(l^*rfX^j' (5) 

fc>0 ^ ^ 

where the products * are computed formally as commutative 
products between canonical forms, the resulting sum being 
viewed as a canonical form. Each of the derivatives has 
bidegree at most (d, r) and the derivative at order k can 
be computed in 0[dr) ops. from the one at order k — 1. 
The complexity is seen to be C'(min(d, r) M(dr)) ops., by 
Lemma 1(d). When d = r = n, this is 0[n M(n^)), or d(n^) 
using FFT; the scheme (2) is just a bit better than (5). 



3. EQUIVALENCE BETWEEN PRODUCTS 
OF MATRICES AND OPERATORS 

Let K be a field of characteristic zero. In [11], van der 
Hoeven showed that Ce(n) and Cd{n) are in C'(MM(n)) . 
When u < 3, this improves upon the algorithms in §2. 

In this section, we explain and improve this result along 
two directions: we make the constant factor explicit in the 
estimate Ce(n) £ C'(MM(n)), and lessen it. Then, we prove 
that {n,n,n), {n,n)g, and {n,n)g are equivalent computa- 
tional problems, in a sense made clear below. 

3.1 Product in K[X] {8} reduces to matrix prod- 
uct: van der Hoeven's algorithm revisited 

A differential operator A in K[X]{9) can be viewed as a 
K-endomorphism of K[X], mapping a polynomial / to A{f). 
As such, it is represented, with respect to the canonical basis 
{X^)i>o of K[X], by an (infinite) matrix denoted M^. The 
submatrix of consisting of its first r > 1 rows and c > 1 
columns is denoted M^^. 

Van der Hoeven's key observation is that an operator A 
of bidegree (d, r) is completely determined by the matrix 
:= MiV.+i,.+ i. Writing A = Eto E;=o and 
using the relation 0^ {X'') = X*" yields 



A{X'') 



^ai,,fc^X'+''- = X*'Y,Ai{k)X\ 



where the polynomials Ai are defined as Ai{X) — Ej=o "^ij^^ 
for all < i < d. Thus the matrix A/"^ has the following 
rectangular banded form: 



A)(o) 

ii(0) io(l) 



^(1) 



Mi) 



(6) 



Ad(0) : ■■• Ao{r) 
Mr) 

Mr) 

The knowledge of A is equivalent to that of all d + 1 poly- 
nomials Ai. Each of the latter having degrees bounded by r, 
this is also equivalent to the data of the values Ai{k), for 
< fc < r and < i < d. This is true by Lagrange inter- 
polation. Thus, A is indeed completely determined by the 
r -I- 1 polynomials A{X''), and also by the matrix M^. 

Now, let A,B € K[X]{6) and let C be BA. Then = 
M^M^. If A, B, and C have bidegrees (dA, ryi), (ds, rs), 
and {dc,rc), then the previous discussion implies the fol- 
lowing "finite version" of this matrix equality: 

which is the basis of the algorithm in [11], described below. 

Putting all these considerations together leads to Algo- 
rithm Mulfl in Fig. 1 and proves the following proposition. 

Proposition 1 Algorithm Mule in Fig. 1 reduces the com- 
putation of the product C — BA to the following tasks: 

(Tl) dA -|- 1 evaluations in degrees < rA on 0, 1, ... , rc; 



Mu\0{B,A) 

Input: A,B £ K[X]{e). 
Output: their product C — BA. 

1. Compute the Ai's and Bi's from A and B, then the 

matrices M^^^^^^i ^^^^^^^i and Ad^^^^^^i ^-^^i. 

2. Compute by Eq. (7). 

3. Compute the Ci's from M'~^ , then recover C. 



Figure 1: Product of differential operators in 9. 

(T2) ds-f 1 evaluations in degrees < rB on 0,1, ... , dA+rc; 
(T3) dc + 1 interpolations in degrees < rc on 0,1, ... ,rc; 

(T4) an instance of {dc + rc + l,dA + rc + l,rc + 1) ■ 

Proof. Eq. (6) shows that Step 1 in Algorithm Mule is 
performed by the evaluation Tasks (T1-T2). Similarly, the 
interpolation Task (T3) performs Step 3. Finally, the prod- 
uct in Step 2 is computed by (T4). □ 

We stress that the evaluation-interpolation scheme used 
in Algorithm Mu\g requires that the interpolation points 
0,1, ... ,rc be mutually distinct. Thus, this scheme would 
not have worked over a field of small characteristic, but 
would have remained valid in large enough characteristic. 

In the original article [11], Tasks (T1-T3) are performed 
by matrix multiplications, as explained in the next lemma. 

Lemma 2 Let d,r,s £N and let ao, . . . ,as be distinct points 
in K. Evaluating d+1 polynomials of degree r on the ai 's re- 
duces to an instance of {s-\-l, r-\-l, d-\-l} plus 0{sr) ops. In- 
terpolating d-f 1 polynomials of degree s on the Oi 's amounts 
to an instance of (s -f 1, s + 1, d -f 1) plus 0{s'^) ops. 

Proof. The omitted proof is based on grouping multipli- 
cations by Vandermonde matrices into a single product. □ 

Using Lemma 2, one immediately deduces the cost of van 
der Hoeven's algorithm "a la lettre" (vdHg); the following 
enumeration displays only the dominating costs, quadratic 
estimates like 0{rcrA) being intentionally neglected: 

1. MM(rc-hl,rA + l,dA + l) for (Tl); 

2. MM(dA + rc + l, re + 1, ds + 1) for (T2); 



3. 



MM{rc + l,rc + l,dc + l) for (T3); 



4. MM(dc rc 1, dA + rc + 1, rc -\- 1) for (T4). 

Notice that the last step dominates the cost. 

For Problem {n,n)e which is studied in [11], applying the 
estimate (1) leads to the number 2-f 3-f 2 ■ 2 • 2 + 4 ■ 3 • 2 = 37 
of n X n block multiplications given in column vdHg of Ta- 
ble 1. This estimate is however pessimistic and can be re- 
duced to 20: Strassen's formula reduces the 8 block prod- 
ucts in Task (T3) to 7; the band structure of the matrices in 
Task (T4) reduces 24 to only 8 products of non-zero blocks. 

A first improvement. Algorithm vdHe can be improved by 
making use of fast multipoint evaluation and interpolation of 
Lemma 1(b) to perform Steps 1 and 3 of Algorithm Mule in 
Fig. 1. This remark will be crucial in our proof of equivalence 
in §3.2. We arrive at the following complexity estimates: 



1. O {dAM{rc) log rc) for (Tl); 

2. 0{dBM(dA+rc)\og{dA + rc)) for (T2); 

3. 0{dcM{rc)\ogrc) for (T3). 

Assuming FFT is available for polynomial multiplication, 
the cumulated cost of Tasks (T1-T3) drops to 

6{dArc + ds(dA + rc) + dcrc) 6 6{dcrc + dAds). 

This cost is nearly optimal, since it is almost linear in the 
number of non-zero elements of the matrices involved in 
Eq. (7). In the particular case of problem {n,n)e, we ob- 
tain the numbers 24 and 8 of column IvdHg in Table 1. 

3.2 Matrix multiplication reduces to product 

in K[x]{e) 

In summary, the results of the previous section show that 
Ce(n) £ C'(MM(n)). Here we prove the converse statement, 
by proceeding in two steps. First, Lemma 3 shows that 
the multiplication, whose complexity is denoted T(n), of 
two lower-triangular matrices of size n x n reduces to the 
product of two operators of bidegree at most (n, n) in {X, 6). 
Secondly, Lemma 4 proves that multiplying two arbitrary 
matrices amounts to a constant number of products of lower- 
triangular matrices. 

Lemma 3 T(n) G Co (n) -f O (n M (n) log n) . 

Proof. Let Li, L2 be two (n-l-1) x (n-l-1) lower-triangular 
matrices. Denote (tij) and (sij) their coefficients, with 
< i,i < Let Bi{X) and Ai[X) be the (unique) polyno- 
mials in ^.[X] of degree at most n — £ that interpolate the 
elements of the ith lower diagonal of Li, resp. L2, on the set 
{0, 1, . . . ,n — i}. Specifically, for < £, j < n with £ + j < n, 
we have te+jj = Bi{j) and si+jj — Ae{j). Using fast in- 
terpolation, the computation of the polynomials Bt{X) and 
Ai{X), for < ^ < n, is done in 0(n M(n) logn) ops. Define 

A = 127=0 J:U<^i,jX'e^ and B = pi=oT.%lK.x'e^ 

from the coefficients in Ai(X) = X^J^q a.i,jX^ and Be{X) — 
H^Zo ^^,3-^^ ■ Let C = BA. Then, L\ and L2 are seen to be 
top- left blocks of and , and Eq. (7) with A replaced 
by C shows that the top-left (n -f 1) x (n H- 1) submatrix 
of M*^ is the lower-triangular matrix L1L2. This subma- 
trix is computed starting from the coefficients of C using 
0{n M(n) logn) ops., by fast multipoint evaluation. □ 

Lemma 4 MM(n) G 0{T(n)). 

Proof. Let M, A'' be n x n matrices. The identity 
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shows that MM([n/3]) < T(7i) G 0(T{n)) and the conclu- 
sion follows from the growth hypotheses on MM. □ 

Lemmas 3 and 4 imply the main result of this section. 

Theorem 1 There exists a constant K > such that 
MM(7i) < K[Cg{n) + nM{n)\ogn). 



3.3 Equivalence between product in K[X] (d) and 

in K[x]{e) 

Relax K to be a field of arbitrary characteristic. Any 
operator A = X^[^q in K[A]{S) with coefficients Qi of 
degree at most d can be expressed in the algebra K.[X] (d) as 
A — X]i=o with coefficients ai of degree at most d + r. 

As indicated in the proof of [4, Cor. 2], performing the 
conversion from the representation in 9 to the representation 
in d amounts to multiplying a Stirling matrix S of size r + 1 
by an {r + 1) x {d -\- 1) matrix containing the coefficients 
of the Qi's. This matrix product can be decomposed into 
d -I- 1 matrix- vector products of the form w = Sv. The 
coefficients of the vector w represent the coefficients of the 
polynomial '^^ViX^ in the falling factorial basis {X)k- As 
Lemma 1(c) holds for any characteristic, w can be computed 
using C'(M(r-) logr) ops. To summarize, the coefficients ai 
can be computed from the Oi's in 0[d M(r) logr) ops. 

Conversely, let B — "^l^gbid^ be in ]K[X](i9). It can be 
written in the algebra K[X, X~^]{9) of differential opera- 
tors in 6 with Laurent polynomial coefficients as follows: 
B = X^[^o ft^'- K the bi's have degrees bounded by d, then 
the /3i's have degrees at most d and valuation at least — r 
in X. A discussion similar to above shows that the computa- 
tion of the coefficients Pi from the coefficients bi amounts to 
multiplying the inverse of the Stirling matrix by an (r + l) x 
(d-|-r--|-l) matrix. This matrix product can be decomposed 
into d -I- r 4- 1 matrix- vector products by S~^; this amounts 
to expanding in the monomial basis d -f r -I- 1 polynomials of 
degree at most r given in the falling factorial basis. Thus, 
the conversion can be done in 0[{d + r) M(r) logr) ops. 

We encapsulate this discussion into the following result, 
which proves that {n,n)g and {n,n)g are computationally 
equivalent, up to 0{n'^) terms, in any characteristic. 

Theorem 2 There exist a constant C > such that 
Co (n) < C (Cs (n) -f n M (n) log n) , 
Ca(n) < C(Ce(n) +nM(n)logn), 
over fields of any characteristic. 

Proof. Let Ai,A2 be of bidegree (n, n) in K[X]{6). By 
the previous discussion, converting them into K[X](9) has 
cost ©(n M(n) log n) . Both A\,A2 have bidegrees at most 
(2n, n) in {X,d) and can thus be multiplied using Cs(2n) G 
C'(Cs(n)) ops. Converting the result back intoK[X](6') costs 
0(71 M(n) logn) ops. This proves the first inequality. 

Let now Bi and B2 be of bidegree {n,n) in M.[X]{d). 
Their conversion in K[X, X~^]{6) can be performed using 
0(n M(n) logn) ops. and produces two operators Ci and C2 
in K[A]{S), of bidegrees at most (2n,n) in {X,9) such that 
Bi = X-'^CiiX, 9) and B2 = A-"C2(X, 9). Using the com- 
mutation rule C2iX, 6»)X-" = X-"C2iX, 9 - n), we deduce 
the equality B2B1 = X-^'^C2{X,9 - n)Ci{X,9). Writing 
C2{X,9) = "^"^Q X-' c'j{9) shows that computing the coeffi- 
cients of C2 {X, 8 — n) amounts to n + 1 polynomial shifts in 
K.[9] in degree at most n. Each of these shifts can be com- 
puted in C'(M(n) logn) ops., using Lemma 1(a). Conversion 
of B2B1 back into K[X]{9) has the same cost. □ 

4. BETTER CONSTANTS IN K[X] {d) 

In §4.1, we revisit van der Hoeven's algorithm for {n,n}g 
and exhibit the constant factor in its 0(MM(n)) cost. Then, 
we propose in §4.2 a new algorithm with a better constant. 



4.1 Multiplication in ]K[x] (a) : van der Hoeven's 
algorithm revisited 

Van der Hoeven's algorithm for computing products in 
K[X](9) is based on the fact that his algorithm for prod- 
ucts in K[X](S) can be adapted to operators with Laurent 
polynomials coefficients. Indeed, to any L £ K\X, X^^]{6) 

of the form L — X^iL-i, H^j^o^'-':'-^'^'' ^® associated an infi- 
nite matrix representing the K-linear map of multiplication 
by L from K[X] to X-"K[X]. Its {v + d + r + 1) x {r + 1)- 
submatrix Mq ,. (defined shortly) is banded and it uniquely 
determines the operator L, as in the case of polynomial co- 
efficients. 

To be precise, for two integers a < /3 we denote by 
the {v + d+ 13 — a + 1) x {l3 — a + l) matrix whose (7 — a-f l)-th 
column, for a < 7 < /3, contains the coefficients of L{X'') 
on . . . , X''+'^. The matrix M^^p has a banded form 

and contains on its diagonals the evaluations on the points 
/3 of the polynomials , . . . ,Ld defined by Li (X) — 
Ei=o ^i.j^' for all -v<i<d. 

Let A,B have valuations —va,—vb and degrees dA,dB 
with respect to X, and degrees tajTb 9. If C = BA 
in K[X, X~^]{9) , then the following equality, analogous to 
Eq. (7), holds: 

Likewise, the product of operators in 1£\X, X~^]{6) re- 
duces to some evaluation and interpolation tasks (in or- 
der to convert between operators and matrices) and to the 
main matrix-multiplication task (8), which is an instance of 
{vc + dc + rc + l,VA + dA + rc + 1, rc + 1). 

The algorithm for multiplication in K[X]{9) based on mul- 
tiplication in 'K[X, X^^]{9) is described in Fig. 2 below. 



Mu\a{B,A) 

Input: A,B £ K[X]{d). 
Output: their product C — BA. 

1. Convert A, B in K[X,X-^]{9). 

2. Compute the product C ^ BA in K[X, X-^]{9): 

2.1 From A and B, compute the matrices 

2.2 Compute the matrix Mq^^^ using Eq. (8). 

2.3 Recover C bom M^^^^^. 

3. Convert C in K[X]{c)) and return it. 



Figure 2: Product of differential operators in d. 

In what follows, we treat in more detail the main case of 
interest, {n,n)g, as solved by Algorithm Mula in Fig. 2. Van 
der Hoeven suggests to perform Steps 1 and 3 using matrix 
multiplications by Stirling matrices and their inverses [11, 
§5.1, Eqs. (12-13)] and Steps 2.1 and 2.3 using matrix mul- 
tiplications by Vandermonde matrices and their inverses [11, 
§2 and §4]. The elements of all the needed Stirling and Van- 
dermonde matrices (and their inverses) can be computed 
using 0{n^) ops. A careful inspection of the matrix sizes 
involved in Algorithm Mula shows that: 

1. Step 1 reduces to 2 instances of (2n -|-l,n + l,n + l); 

2. Step 3 reduces to an instance of (4n + l, 2n-|-l, 2n-f 1); 

3. Step 2.1 reduces to an instance of {2n + 1, n-f 1, 4n-f 1) 
and an instance of (2n + l,n + l,2n + 1); 



4. Step 2.3 reduces to an instance of {47i+l, 2n+l, 2n+l); 

5. Step 2.2 reduces to an instance of (6n+l, 4n+l, 2n+l). 

This variant of the algorithm is what we call vdH. Using 
again the estimate (1) yields the constant 96 in Table 1. 

Several Improvements. A first improvement on vdH is to 
use fast multipoint evaluation and interpolation for Steps 

2.1 and 2.3. A second improvement concerns conversions 
back and forth between operators in ^[X, X~^]{d) and in 
K[X, X~^\{9) (Steps 1 and 3). Instead of using matrix prod- 
ucts by Stirling matrices and their inverses, one can apply 
Lemma 1(c), as explained in §3.3. Both improvements in 
conjunction with FFT lessen the cost of Steps 1, 2.1, 2.3, 
and 3 to a negligible (5(n^). We call this improved algo- 
rithm IvdH. Using (1) yields the constant 48 in column IvdH 
in Table 1. The constants 47 and 12 on the last row of the 
table are more technical and will be proved in [2]. They rely 
on observing that the output of IvdH requires partial calcula- 
tion of (8) , reducing to an instance of (4n -f 1 , 3?! -f 1 , 2n -I- 1) . 

4.2 A new, direct evaluation-interpolation al- 
gorithm 

Let A and B be in K[X]{9) with respective bidegrees 
[dA,rA) and {dB,rB). We give here an evaluation-inter- 
polation algorithm for computing C — BA which essentially 
reduces to {dc + l,dA + rc + 1, rc + 1) for those bidegrees. 

To achieve this, we interpret again a differential opera- 
tor P in K[X](9) as a K-endomorphism of K[X], and repre- 
sent it in the canonical basis (X*)i>o by an (infinite) matrix 
denoted M^. The submatrix of consisting of its first 
r -I- 1 > 1 rows and c + 1 > 1 columns is denoted M^^.. 

Then, much like Algorithm Mu\e in §3.1, our new algo- 
rithm MulWeyl in Fig. 4 relies on the key observation that 
an operator P £ K.[X] {d) of bidegree (d, r) is uniquely deter- 
mined by the submatrix M^^ of ■ This key fact is proved 
in Theorem 4 below. The principle of the algorithm is given 
in Fig. 3, where evaluation and interpolation are performed 
by truncated-series products. In the case of (n, n)a, the cor- 

B A BA 



Evaluation mod X''g+^ | | Evaluation | | Interpolation 




Figure 3: Evaluation-Interpolation w.r.t. d. 

responding matrices become A/^ 3„, M^ 2nj a^nd M2n,2: 

Theorem 3 Algorithm MulWeyl is correct and uses 
MM(dc + l,dA+rc + 1, rc + 1) + 0{{dc + rcf) ops. 



MulWeyl(B, A) 

Input: A,B € K[X]{d). 
Output: their product C = BA. 

1. Construct the matrices MaA+rc,rc ^dcd^+rc- 

2. Compute the product Mi^^aA+rc^dA+rc,rc- 

3. Recover C from the product in Step 2. 



Figure 4: Product of differential operators in d. 



Proof. By the definition of the matrix M^^, the matrices 
constructed in Step 1 are associated to the linear map which 
sends / G K[X]<rp to A{f) in M.[X]<dj^+rc and to the lin- 
ear map which sends / € K[X]<d^+rp to B{f ) mod X'''^'^^ 
in M.[X]<dc- Therefore, the product at Step 2 delivers the 
modX'^c'+i, 0<i<rc- The (dc + l) x (rc + 1) 
matrix computed is thus equal to M^^,,^. This is summa- 



rized in the identity M^^^a 

which the structure of zeros is given in Fig. 3. The inter 
polation of Step 3 relies on Theorem 4 below, which shows 
that C — BA is fully and uniquely determined by ^.^ . 
This terminates the correctness proof. The claimed com- 
plexity derives immediately from Propositions 2 and 3 that 
are proved in the next subsections. □ 

4.2.1 Interpolation theorem 

We now state the main interpolation result, which we 
prove after recalling a useful filtration on VK = K[X]{9). 

Theorem 4 For d,r aN, let Wd,r denote its K-subspace 

Wd,r^{PeW : deg^(P) < d, degg(P) < r }. 
Then, an isomorphism is given by the lK-/inear map 



EvOp, 



d,r 



Wd,r 
P 



]g{d+l)x(r+l) 



ML 



In order to prove Theorem 4, we use the filtration on W 
defined by the weights 1 on X and —1 on d. The decompo- 
sition into homogeneous components of any P £ Wd,r only 
involves weights between — r and d. It actually admits a 
special form, to be exploited later, which is described now. 

Lemma 5 The homogeneous decomposition of P £ Wd,r is 

r d 

P = Y, £-^iX^)^' + ^ X'£,{Xd), 

1=1 i=0 

where the £i 's and £-i 's are polynomials of degree at most 
/ii := min(d — i, r) and := min(r — i, d), respectively. 

Proof. Let P be j PijX^ d\ Then P decomposes as 

the sum ofj^iyj PrAX'd^)d'-^ and of Ei<,- 

Here pij is zero if i > r or if j > d, therefore P is equal to 

^ {s2p,+.,,x^d'\ d^ + J2x' (f2p^,t+a'd'] . (9) 

s = l \i=0 / t = \i=0 / 

Since any X^d^ can be written as a polynomial of degree i 
in Xd, the conclusion follows by expressing each parenthesis 
in (9) as a polynomial in Xd. □ 



Proof of Th. 4. Since dimxVKd.r is dimsK^'^+^^'^t'^+i), 
it suffices to show that EvOp^ ^ is injective. Let P in Wd,r be 
such that = 0' equivalently P{X'') mod = 

for all < fc < r. The decomposition of P € Wd,r in 
Lemma 5 enables one to evaluate it easily at X'' for k < r: 

P{X')=Y^j^,£-.(k-i)X'-^ + Y.Hk)X'+\ (10) 

Since P(X'=) mod X'^+'^ = for fc < r, Eq. (10) implies: 

• £i(k) = a < i < d, < k < r, and k + i < d, 

• £-i{k - i) = ii 1 < i < k, < k < r, and k - i < d. 

These equalities show that £{(0), . . . , (min(ci — i,r)) are 
zero for < i < d and that £_i(0), . . . , i^i (min(r — i, d)) are 
zero for 1 < i < r. Finally, Lagrange interpolation and the 
degree bounds in Lemma .5 imply that all the polynomials 
£i and £^i are identically zero. Thus, P is 0. □ 

A direct use of the ideas of this subsection would now end 
the proof of Theorem 3; the corresponding algorithm would 
first compute the polynomials £i and £-i, before evaluating 
them on 0, 1, . . .. By the following next two subsections, we 
shall propose a better solution, avoiding a logarithmic factor 
and hiding a smaller constant in the 0{-) term. 

4.2.2 Evaluation step 

Here we focus on Step 1 of Algorithm MulWeyl, which is an 
instance of the task of computing the matrix A/^ „ for given 

P = X]i=o X]j=oPij^"'^' in ^ ^nd integers m > d,n > r. 
The announced better approach makes use of Algorithm Eva I 
in Fig. 5, which is based on the following observation: Let 
< fc < n. Then we have the identities 



min(r,fc) d 



k] 



j=0 3=0 



{k 



tX 



k-\-j — i 



min(r,d — £, A;) 



E 



miii(7',fc) i — max(0, — £) 



X' 



Therefore, for — r < £ < d and < fc < n, the coeffi- 
cient {Si)k of X*" in the polynomial product 



miii(r,(i- 



E E 



X' 



(11) 



^i=max(0,-^) ' ^j = 

gives the coeflicient of in (fc! X*) "^P(X'=). Thus the 
coefficients {Si)k for max(0, —£) < k < min(m — £,n) of Si 
are, up to factorials, the coefficients on a certain diagonal of 
the matrix „, the other diagonals of M^.n being zero. 

Proposition 2 Algorithm Eval computes „ in M{mn) + 
0{mn) ops. 

Proof. The series exp(X) mod X"+^ and the factorials 
1, . . . , n! are computed by recurrence relations in 0{n) ops. 
The computation of Se can be done in M{se) for the size si 
of the corresponding diagonal of A/^_„. Summing over £ and 
appealing to properties of M leads to Yle M(s^) < M(5^^ si) 
< M(mn) + 0{mn), then to the announced complexity. □ 



Eval(P) 

Input: P £ Wd,r, m > d, n > r. 
Output: M^^„. 

1. For each -r < t < d, compute 5*^ mod 

by using Eq. (11). 

2. Initialize M to be an (m + 1) x (n + 1) zero matrix. 

3. For — r < £ < d and max(0, —£) < k < min(?n — £,n), 

Mk+e.k ■■= k\ {Se)k. 



Figure 5: Evaluation in K[X]{d). 

4.2.3 Interpolation step 

Given a (d + 1) x (r + 1) matrix M, Step 3 of Algorithm 
MulWeyl computes the only operator P £ Wd,r satisfying 
MJ^^ — M. This is done by inverting Eq. (11). The resulting 
algorithm is described in Fig. 6. A similar analysis to that 
of algorithm Eva I leads to the estimate in Proposition 3. 



lnterpol(A/) 

Input: M G k(''+i)'<''-+i). 

Output: P G Wd,r such that M^,. = M. 

1. Divide the fcth column of M by A;!. 

2. For each —r<£<d, compute the product Te — 

(min(d-f,r) \ 
Y, Mk+e,kX'' exp(-X) mod 
fc^max(0,-^) / 
r min(d-— i.r) 

3. Return ^ ^ {Te),X^+'d\ 

i — O I—— min(i.r) 



Figure 6: Interpolation in ]K[X](9). 



Proposition 3 Interpol computes P in M(dr) + 0{dr) ops. 

4.3 Comparison of algorithms for ( 

Algorithms Mulg in Fig. 2 and MulWeyl in Fig. 4 fol- 
low the same scheme: construction of evaluation matrices 
associated to A and B; product of these matrices; recon- 
struction of C by interpolation from it. But they differ 
in the way to do this, and MulWeyl can be viewed as an 
improvement on Mulg: the matrices computed by MulWeyl 
are submatrices of and in Algorithm Mula, as will 
be proved in [2]. Taking accurate sizes into account for 
{n,n)g, the dominant matrix-product problem drops from 
{6n + l,4n + 1, 2n + 1) to (2n -|- 1, 3n + 1, 2n + 1). Esti- 
mate (1) yields the number 12 in the last column of Table 1. 
Observing that the product at Step 2 of MulWeyl reduces to 
one instance of (2n-|-l, 2n-|-l, 2n-|-l) and one of (n, n, n), and 
appealing to Strassen's formula again, we obtain 7-1-1 = 8 
block products, as given on the last row of Table 1. 

5. PRODUCT IN CHARACTERISTIC >o 

As already pointed out, the evaluation-interpolation algo- 
rithms of Sections 3 and 4 remain valid when the character- 
istic p of K is positive and sufflciently large, but they fail to 
work in small characteristic. For instance, MulWeyl solves 
Problem {n,n)g for characteristic p > 3n. 



In this section, we provide an algorithm of different nature 
which proves that, in characteristic p, the product of two 
operators of bidegree (n, n) either in K.[X] {9} or in K[Jf ] (d) 
can be computed in 0{pn^) ops. For small p, this result is 
nearly optimal, since it is softly linear in the output size. 

Up to Oin?) additional ops., multiplication in K[Jt](9) 
can be reduced to multiplication in K[Jt](^), as explained 
in §3.3. Thus, we focus on Problem (n, n)g. 

Our algorithm Mulg^p for multiplication in K[X] {9) is given 
in Fig. 7. It is based on the key fact that 9 and X^ commute 
in characteristic p. This is used in Step 2, which reduces the 
product in ]K[X] {9) to several products in the commutative 
polynomial ring K[X*',6']. 



Figure 7: Product of differential operators in 9 over 
a field of positive characteristic. 

We now describe proper algorithmic choices that perform 
each step of Mulg^p in nearly optimal complexity. 

Step 1 first rewrites A as Yfv^l ^v{X^ ,9) and B as 
X"B„(Xf, 61), where Q<u,v< p-1 are poly- 

nomials in ]K[X'', 6*] of bidegree at most ( [n/pj , n); this costs 
no ops. The commutation 9-' X^ — X"" (9 + «)■' then enables 
one to rewrite A as J]^^,, X(XP, 6»)X^ where A^{X^ ,9) 
is AviX"^ ,6 — v). Thus, each A^ is obtained by computing 
[n/pj -f 1 shifts of polynomials of degree at most n. By 
Lemma 1(a), this results in 0{n M(n) log n) ops. for Step 1. 

Each product in Step 2 involves polynomials in ]K[X*',6] 
of bidegree at most {\n/p\,n). Thus using Lemma 1(d), 
Step 2 is performed in ©(p^ M(nVp)) Q 0(pM(n^)) ops. 
Note that Cu,v{X,Y) has bidegree at most (2[n/pJ , 2n). 

To perform Step 3, each Cu,v{X^ ,9)X'" is first rewrit- 
ten as X"" Cu,v{X^ ,9) by computing 2 [n/pj -f 1 shifts of 
polynomials of degree at most 2n. This can be done in 
O {pn M (n) log n) ops. Finally, O(pn^) ops. are sufficient to 
put C = ^" J2lZl X''Cu,4X'',9) in canonical form. 

Summarizing, we have just proved: 

Theorem 5 Let K be a field of characteristic p and let D 
be one of the operators d,6. Then, two operators of bide- 
gree {n,n) in K[X]{D) can be multiplied in C'(pM(n^) -|- 
pn M(n) logn) ops., thus in 0{pn^) ops. when FFT is used. 

6. EXPERIMENTS 

Table 2 provides timings of calculations in magma by im- 
plementations of several algorithms and algorithmic vari- 
ants. Each row corresponds to calculations on the same 
pair of randomly generated operators in bidegree (n, n), for 
n — 10-2'^. Coefficients are taken randomly from 'L/p'L 
when p > 0, the prime used being pi = 65521 (largest prime 



Mule,p(B,yl) 

Input: A,B e K[X]{9), with char(]K) = p > 0. 
Output: their product C — BA. 

1. Rewrite A and B as A = Y,lZlA4X'',9)X'" 

and B = E^:;'o^"54^^''^)• 

2. Compute the commutative bivariate products 

Cu,v = BuAy, for < u,w < p. 

3. Write J2l~y^^oX''Cu,4XP,9)X'' in canonical form; 

return it. 



to fit on 16 bits) and P2 = 4294967291 (largest prime to 
fit on 32 bits). When p = 0, computations are performed 
over Q, with random integer input coefficients on 16 bits. 



p 


k 


S 


B 


BZ 


vdH 


Iter 


Tak 


Rec 


Int 


BZI 


vdHI 


Pi 


3 


0.25 


0.26 


0.25 


0.39 


0.32 


1.23 


0.01 


0.64 


5.22 


59.8 


Pi 


4 


0.95 


0.97 


0.95 


1.68 


4.13 


12.09 


0.03 


4.37 


35.0 


418 


Pi 


c: 





All 


A '\A 


Sin 






n 9(1 
u. 


•^n 9 


940 




pi 


6 


21.4 


21.1 


22.2 


45.1 


397 


1407 


1.56 


209 


1692 


oo 


Pi 


7 


107 


105 


104 


275 


oo 


oo 


13.3 


1507 


oo 


oo 


P2 


3 


0.50 


0.63 


0.62 


1.08 


2.25 


5.61 


0.08 


1.10 


8.00 


82.2 


P2 


4 


2.24 


2.66 


2.68 


4.52 


19.07 


67.73 


0.35 


9.22 


58.2 


602 


P2 


5 


12.2 


14.5 


14.1 


24.4 


187 


926 


1.63 


75.6 


420 


oo 


P2 


6 


88.1 


111 


114 


172 


2604 


oo 


9.40 


770 


3146 


oo 


P2 


7 


1961 


2452 


2633 


oo 


oo 


oo 


59.1 


oo 


oo 


oo 





3 


9.93 


12.0 


11.3 


28.4 


6.99 


24.3 


0.07 


0.93 


16.9 


309 





4 


128 


164 


164 


498 


118 


725 


0.27 


6.89 


204 


oo 





5 


2164 


2737 


2725 


oo 


2492 


oo 


4.37 


51.4 


3172 


oo 



Table 2: Timings on input of bidegree (10 ■ 2*^, 10 • 2*=). 



The calculations were performed on a Power Mac G5 with 
two CPUs at 2.7 GHz, 512 kB of L2 Cache per CPU, 2.5 GB 
of memory, and a bus of speed 1.35 GHz. The system used 
was Mac OS X 10.4.10, running Magma V2. 13-15. Compu- 
tations killed after one hour are marked oo. 

We provide several variants of our algorithm (S, B, and 
BZ), as well as various others: S: direct call to magma's 
matrix multiplication in order to compute M2n,i,nM^n,2n \ B 
and BZ: block decomposition into nxn matrices before call- 
ing magma's matrix multiplication on, respectively, 11 block 
products (using Strassen's algorithm) and by 8 block prod- 
ucts (taking the nullity of 2 blocks into account as well); 
vdH: Van der Hoeven's algorithm, as described in [11], 
and optimized as much as possible as the implementation S 
above; Iter and Tak: iterative formulas (2) and (5); Rec: 
magma's multiplication of a (2n+ 1) x (3?i-|- l)-matrix by a 
(3n + 1) X (2n -f l)-matrix, that is, essentially all the linear 
algebra performed in variant S (in practice, almost always in 
the cubic regime for the objects of interest); Int: fully inter- 
preted implementation of Strassen's product with cubic loop 
under a suitable threshold; BZI and vdHI: variants of the 
implementations BZ and vdH (with evaluation-interpolation 
steps improved) in which magma's product of matrices has 
been replaced with Int. 



p 




Pi 


Pi 


P2 


P2 


P2 











k 




3 


7 


3 


5 


7 


3 


4 


5 


LA 


0{MM{n)) 


4% 


13% 


17% 


16% 


39% 


36% 


41% 


52% 


PP 


0(nM(n)) 


13% 


25% 


23% 


23% 


18% 


36% 


33% 


24% 


OM 




38% 


36% 


30% 


27% 


11% 


7% 


6% 


5% 


lO 




46% 


27% 


30% 


33% 


32% 


21% 


20% 


19% 



Table 3: Fraction of time spent in matrix product 
(LA), polynomial products (PP), other matrix oper- 
ations (OM), and other interpreted operations (lO). 



Comparing the columns Rec and, for instance, S, shows 
that linear algebra does not take the main part of the calcu- 
lation time, although its theoretical complexity dominates. 
In this regard, we have been very cautious in our implemen- 
tation to avoid any interpreted quadratic loops. Still, the 
result is that those quadratic tasks dominate the computa- 
tion time. Details are given in Table 3. The conclusion is 
that having implemented the algorithms in an interpreted 



language tends to parasitize the benchmarks. For compari- 
son sake, we have also added timings for variants BZI and 
vdHI that use an interpreted matrix product. They both 
show the growth expected in theory, as well as the ratio 
from 8 to 96 announced in Table 1. 

7. CONCLUSIONS, FUTURE WORK 

Because of space limitation, various extensions could not 
be covered here. More results on the complexity of non- 
commutative multiplication of skew polynomials will be pre- 
sented in an upcoming extended version [2]. Topics like 
multiplication of skew polynomials with unbalanced degrees 
and orders, or with sparse support, will be treated there. 
The case of rational (instead of polynomial) coefficients will 
also be considered. The methods of this article extend to 
multiplication of more general skew polynomials, in one or 
several variables, including for instance g-recurrences and 
partial differential operators. 

The constants in Table 1 are all somewhat pessimistic. 
Tighter bounds can be obtained by, on the one hand, re- 
laxing the naive assumption (1), on the other hand, taking 
advantage of the special shapes (banded, trapezoidal, etc) 
of the various matrices. 

We also plan to provide a lower-level implementation. 
Hopefully, the timings would then reflect the theoretical re- 
sults even better and will be close to those of naked matrix 
products. 
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